Add Z-Image LoRA fine-tuning support #1127
Conversation
Codecov Report: ❌ Patch coverage is
I am getting this error while running a fine-tuning task with any diffusion model on the diffusion trainer plugin. I tried fine-tuning SDXL and other Stable Diffusion models but got this error on every run.
I had this error once; updating timm resolved it. But it may or may not help in your case.
dadmobile
left a comment
I wasn't able to get this running. Let's spend some time this week syncing on what's required, and then we can do a patch release and announce this!
When trying to generate with Z-Image Turbo I kept getting a vague error that I will have to debug.
When trying to run training, I would get:
Error in Job: 'FlowMatchEulerDiscreteScheduler' object has no attribute 'add_noise'
Traceback (most recent call last):
File "/home/azureuser/transformerlab-app/api/transformerlab/plugin_sdk/transformerlab/sdk/v1/tlab_plugin.py", line 105, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/.transformerlab/orgs/3c33c85b-628a-4ca8-93d3-b657cb7973b2/workspace/plugins/diffusion_trainer/main.py", line 818, in train_diffusion_lora
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/azureuser/.transformerlab/orgs/3c33c85b-628a-4ca8-93d3-b657cb7973b2/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/diffusers/configuration_utils.py", line 144, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'FlowMatchEulerDiscreteScheduler' object has no attribute 'add_noise'
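For context on this failure: flow-matching schedulers in diffusers expose `scale_noise` rather than the DDPM-style `add_noise`. A dispatch shim along these lines (the helper name is mine, not the plugin's) is one way to support both families:

```python
def add_noise_compat(scheduler, latents, noise, timesteps):
    """Noise latents with whichever API the scheduler exposes.

    DDPM-style schedulers implement add_noise(); flow-matching schedulers
    such as FlowMatchEulerDiscreteScheduler implement scale_noise() instead.
    """
    if hasattr(scheduler, "add_noise"):
        return scheduler.add_noise(latents, noise, timesteps)
    if hasattr(scheduler, "scale_noise"):
        # Note the different argument order: (sample, timestep, noise)
        return scheduler.scale_noise(latents, timesteps, noise)
    raise AttributeError(
        f"{type(scheduler).__name__} has neither add_noise nor scale_noise"
    )
```

This is a sketch of the general pattern, not the fix the plugin ended up shipping.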
deep1401
left a comment
I think we've let this sit long enough that we're getting version errors now. Running this, I get errors like the following, which you might want to look at:
Using default home directory: /home/transformerlab/.transformerlab
Error executing plugin: Could not import module 'BloomPreTrainedModel'. Are this object's requirements defined correctly?
Traceback (most recent call last):
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2317, in __getattr__
module = self._get_module(self._class_to_module[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2347, in _get_module
raise e
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2345, in _get_module
return importlib.import_module("." + module_name, self.__name__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/envs/transformerlab/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py", line 29, in <module>
from ...modeling_layers import GradientCheckpointingLayer
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/modeling_layers.py", line 28, in <module>
from .processing_utils import Unpack
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/processing_utils.py", line 37, in <module>
from .image_utils import ChannelDimension, ImageInput, is_vision_available
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/image_utils.py", line 55, in <module>
from torchvision.transforms import InterpolationMode
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torchvision/__init__.py", line 10, in <module>
from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils # usort:skip
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
@torch.library.register_fake("torchvision::nms")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/library.py", line 1073, in register
use_lib._register_fake(
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/library.py", line 203, in _register_fake
handle = entry.fake_impl.register(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/_library/fake_impl.py", line 50, in register
if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator torchvision::nms does not exist
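The `operator torchvision::nms does not exist` failure is the classic symptom of mismatched torch and torchvision releases in the plugin venv. A rough pairing check, based on the usual torch 2.x release cadence (torchvision `0.(minor + 15)`, e.g. torch 2.1 with torchvision 0.16) — this is a heuristic of mine, not an official table:

```python
def expected_torchvision_minor(torch_version: str) -> int:
    """Heuristic pairing for torch 2.x: torchvision 0.(torch_minor + 15).

    e.g. torch 2.1.x pairs with torchvision 0.16, torch 2.5.x with 0.20.
    Always confirm against the torchvision release notes before pinning.
    """
    base = torch_version.split("+")[0]  # strip local tags like "+cu118"
    major, minor = (int(p) for p in base.split(".")[:2])
    if major != 2:
        raise ValueError("heuristic only covers torch 2.x releases")
    return minor + 15
```

Reinstalling both packages together from the same index (as setup.sh attempts) is the usual remedy.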
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough

Adds Z-Image pipeline support across training and inference (loading, tokenizer/prompt encoding, FlowMatchSFTLoss, LoRA handling and save paths), bumps the diffusion_trainer plugin version, expands the installer script, adds model-reference resolution and generation-kwargs filtering for image pipelines, and introduces runtime/config helpers and related tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Trainer as Training Pipeline
    participant Loader as ModelConfig Loader
    participant ZPipe as ZImagePipeline
    participant Tokenizer as Z-Image Tokenizer
    participant Loss as FlowMatchSFTLoss
    participant Saver as LoRA Saver
    User->>Trainer: start Z-Image training
    Trainer->>Loader: build_zimage_model_configs(model_path)
    Loader-->>Trainer: model & tokenizer configs
    Trainer->>ZPipe: instantiate pipeline (device / dtype / freeze parts)
    ZPipe-->>Trainer: pipeline ready
    Trainer->>Tokenizer: encode_prompt_zimage(prompts)
    Tokenizer-->>Trainer: prompt embeddings
    Trainer->>Loss: forward(batch, embeddings, sizes/crops)
    Loss-->>Trainer: loss
    Trainer->>Saver: save LoRA (safetensors or fallback)
    Saver-->>User: checkpoint saved
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
🧹 Nitpick comments (1)
api/transformerlab/plugins/diffusion_trainer/main.py (1)
570-586: Remove duplicate VAE xFormers enablement. The VAE call is executed twice for non-ZImage paths. Keep a single guarded call.
♻️ Suggested cleanup
```diff
- if hasattr(vae, "enable_xformers_memory_efficient_attention"):
-     vae.enable_xformers_memory_efficient_attention()
- if not is_zimage and hasattr(vae, "enable_xformers_memory_efficient_attention"):
+ if not is_zimage and hasattr(vae, "enable_xformers_memory_efficient_attention"):
      vae.enable_xformers_memory_efficient_attention()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 570 - 586, The VAE's enable_xformers_memory_efficient_attention is being called twice in the xFormers enable block; remove the duplicate call so the code only invokes vae.enable_xformers_memory_efficient_attention() once and guard it with hasattr(vae, "enable_xformers_memory_efficient_attention") and the is_zimage check as appropriate (use unet.enable_xformers_memory_efficient_attention() and a single conditional call to vae.enable_xformers_memory_efficient_attention() when available and when not is_zimage).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/plugins/diffusion_trainer/main.py`:
- Around line 62-95: In build_zimage_model_configs, validate the local-path glob
results (transformer_paths, text_encoder_paths, vae_paths, and tokenizer path)
when model_id_or_path is a directory: if any of these lists/paths are empty or
missing, raise a clear ValueError describing which component is missing instead
of letting downstream opaque errors occur; keep using ModelConfig for each
component but fail fast with an explicit message naming the missing asset (e.g.,
"missing transformer files", "missing text_encoder files", "missing vae file",
or "missing tokenizer directory") so callers can immediately act.
- Around line 453-483: When defaulting Z-Image to BF16 in the mixed-precision
logic, guard that choice with an actual hardware BF16 support check: inside the
block that sets weight_dtype based on is_zimage and mixed_precision (referencing
is_zimage, mixed_precision, weight_dtype, device), only set weight_dtype =
torch.bfloat16 if CUDA is available and torch.cuda.is_bf16_supported() returns
True; otherwise fall back to torch.float32 (or respect explicit "bf16" request).
Ensure the check runs before assigning weight_dtype so non-BF16 GPUs/CPUs won't
get bfloat16 by default.
- Around line 491-553: The build currently relies on diffsynth APIs used around
ZImagePipeline.from_pretrained and pipe.scheduler.set_timesteps (seen in
main.py), but setup.sh installs diffsynth without a version pin; update setup.sh
to pin diffsynth to a compatible minimum/locked version (e.g., change the
install spec to diffsynth>=0.X.Y or a specific tested release) so the pipeline
code (ZImagePipeline.from_pretrained, scheduler.set_timesteps, and related
behavior) remains stable across environments.
- Around line 1005-1018: The code references an unreleased class
FlowMatchSFTLoss (imported from diffsynth.diffusion.loss) which isn't available
in public diffsynth v2.0.4; update the codebase and dependency declarations:
either replace FlowMatchSFTLoss usage in main.py (around the is_zimage branch
where input_latents, prompt_embeds, vae_encoder, and encode_prompt_zimage are
used) with a public, supported loss class or vendor the missing implementation,
and then pin and document the exact diffsynth fork/commit or custom package in
requirements.txt or pyproject.toml; also ensure the replacement/vendored code returns a
PyTorch tensor (compatible with the .item() call) and preserve the
gradient_checkpointing flags (use_gradient_checkpointing and
use_gradient_checkpointing_offload) so runtime behavior remains consistent.
In `@api/transformerlab/plugins/diffusion_trainer/setup.sh`:
- Around line 3-7: Update the PEFT requirement from "peft>=0.15.0" to
"peft>=0.17.0" in the shell install line (replace the existing uv pip install
"peft>=0.15.0" diffsynth command with uv pip install "peft>=0.17.0" diffsynth)
and also adjust the PEFT version constraint in the project-level pyproject.toml
optional dependencies entries so they no longer pin to 0.14.0/0.15.2 but allow
>=0.17.0, ensuring consistency with diffusers 0.36.0 and other diffusion
plugins.
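The BF16 guard described in the mixed-precision item above reduces to simple selection logic. A sketch with the hardware probes passed in as booleans (in the real code these would come from `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`; this helper is illustrative, not the plugin's actual function):

```python
def pick_weight_dtype(requested, cuda_available, bf16_supported):
    """Choose a training dtype, only defaulting to bf16 when hardware allows.

    `requested` is the user's mixed_precision setting ("bf16", "fp16", or None).
    """
    if requested == "bf16":
        # Honor an explicit request only if the device can actually run bf16.
        return "bfloat16" if (cuda_available and bf16_supported) else "float32"
    if requested == "fp16":
        return "float16"
    # Z-Image default: bf16 when supported, otherwise full precision.
    if cuda_available and bf16_supported:
        return "bfloat16"
    return "float32"
```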
---
Nitpick comments:
In `@api/transformerlab/plugins/diffusion_trainer/main.py`:
- Around line 570-586: The VAE's enable_xformers_memory_efficient_attention is
being called twice in the xFormers enable block; remove the duplicate call so
the code only invokes vae.enable_xformers_memory_efficient_attention() once and
guard it with hasattr(vae, "enable_xformers_memory_efficient_attention") and the
is_zimage check as appropriate (use
unet.enable_xformers_memory_efficient_attention() and a single conditional call
to vae.enable_xformers_memory_efficient_attention() when available and when not
is_zimage).
In `api/transformerlab/plugins/diffusion_trainer/main.py`:

```python
pipe = None
if is_zimage:
    # Ensure the model is downloaded locally if it's not already a directory
    if not os.path.isdir(pretrained_model_name_or_path):
        from huggingface_hub import snapshot_download

        print(f"Downloading Z-Image model {pretrained_model_name_or_path} from Hugging Face...")
        pretrained_model_name_or_path = snapshot_download(
            repo_id=pretrained_model_name_or_path,
            allow_patterns=["*.safetensors", "*.json", "tokenizer/*"],
        )
        print(f"Model downloaded to: {pretrained_model_name_or_path}")

    model_configs, tokenizer_config = build_zimage_model_configs(pretrained_model_name_or_path)
    pipe = ZImagePipeline.from_pretrained(
        torch_dtype=weight_dtype,
        device=device,
        model_configs=model_configs,
        tokenizer_config=tokenizer_config,
    )

    pipe.scheduler.set_timesteps(int(args.get("num_train_timesteps", 1000)), training=True)
    noise_scheduler = pipe.scheduler
    tokenizer = pipe.tokenizer
    text_encoder = pipe.text_encoder
    vae_encoder = pipe.vae_encoder
    vae_decoder = pipe.vae_decoder
    unet = pipe.dit
    model_component_name = "dit"
    text_encoder_2 = None
    tokenizer_2 = None
    vae = None
else:
    temp_pipeline = AutoPipelineForText2Image.from_pretrained(pretrained_model_name_or_path, **pipeline_kwargs)

    # Extract components from the loaded pipeline
    noise_scheduler = temp_pipeline.scheduler
    tokenizer = temp_pipeline.tokenizer
    text_encoder = temp_pipeline.text_encoder
    vae = temp_pipeline.vae

    # Handle different architectures: FluxPipeline uses 'transformer', others use 'unet'
    # We use 'unet' as a unified variable name for the main model component regardless of architecture
    if hasattr(temp_pipeline, "transformer"):
        # FluxPipeline and other transformer-based models
        unet = temp_pipeline.transformer
        model_component_name = "transformer"
    else:
        # SD 1.x, SDXL, SD3 and other UNet-based models
        unet = temp_pipeline.unet
        model_component_name = "unet"

    # Handle SDXL case with dual text encoders
    text_encoder_2 = getattr(temp_pipeline, "text_encoder_2", None)
    tokenizer_2 = getattr(temp_pipeline, "tokenizer_2", None)

    # Clean up temporary pipeline
    del temp_pipeline
    torch.cuda.empty_cache() if torch.cuda.is_available() else None

print(f"Model components loaded successfully: {pretrained_model_name_or_path}")
print(f"Architecture detected - Model component ({model_component_name}): {type(unet).__name__}")
if text_encoder_2 is not None:
```
🧩 Analysis chain
🏁 Script executed:
```shell
# Find dependency files to see diffsynth version pinning
find . -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "poetry.lock" -o -name "Pipfile" \) | head -20
```

Repository: transformerlab/transformerlab-app
Length of output: 216
🏁 Script executed:
```shell
# Search for diffsynth imports in the codebase to understand usage
rg "diffsynth" --type py -i | head -30
```

Repository: transformerlab/transformerlab-app
Length of output: 378
🏁 Script executed:
```shell
# Look for the specific file mentioned in the review
head -50 api/transformerlab/plugins/diffusion_trainer/main.py
```

Repository: transformerlab/transformerlab-app
Length of output: 1472
🏁 Script executed:
```shell
# Check the pyproject.toml for diffsynth version
cat api/pyproject.toml | grep -A 50 "dependencies"
```

Repository: transformerlab/transformerlab-app
Length of output: 2066
🌐 Web query:
diffsynth ZImagePipeline from_pretrained API documentation
💡 Result:
diffsynth.pipelines.z_image.ZImagePipeline.from_pretrained is defined in the DiffSynth-Studio repo here: diffsynth/pipelines/z_image.py. It has this signature (as of main):
```python
from_pretrained(
    torch_dtype=torch.bfloat16,
    device=get_device_type(),
    model_configs: list[ModelConfig] = [],
    tokenizer_config: ModelConfig = ModelConfig(
        model_id="Tongyi-MAI/Z-Image-Turbo",
        origin_file_pattern="tokenizer/",
    ),
    vram_limit: float = None,
    enable_npu_patch: bool = True,
)
```
[1]
Parameter meaning (project docs/examples):
- `torch_dtype`: compute dtype used by the pipeline (not necessarily how weights are stored/offloaded). [2]
- `device`: pipeline execution device (e.g. "cuda"). [1]
- `model_configs`: list of `ModelConfig` entries describing what to download/load (by `model_id` + `origin_file_pattern`, or local `path`, and optionally VRAM/offload settings). [2][3]
- `tokenizer_config`: `ModelConfig` pointing to the tokenizer folder (default is the Z-Image-Turbo tokenizer). [1][3]
- `vram_limit`: enables VRAM-management behavior during `download_and_load_models(...)` (used in official examples as "available VRAM (GiB) minus some margin"). [1][3]
- `enable_npu_patch`: applies an NPU compatibility patch during init. [1]
Example usage (Z-Image Turbo quick start): uses model_configs=[ModelConfig(... transformer ...), ModelConfig(... text_encoder ...), ModelConfig(... vae ...)] plus tokenizer_config=ModelConfig(..., origin_file_pattern="tokenizer/") and an optional vram_limit=.... [3]
Related docs pointer: the general “Loading Models” section explains from_pretrained, ModelConfig, remote vs local loading, and env vars like DIFFSYNTH_MODEL_BASE_PATH / DIFFSYNTH_SKIP_DOWNLOAD. [2]
Sources: [1] [2] [3]
🌐 Web query:
diffsynth scheduler set_timesteps training parameter
💡 Result:
In DiffSynth(-Studio) the meaning of the training parameter depends on which scheduler you’re using:
FlowMatchScheduler (diffsynth/schedulers/flow_match.py)
set_timesteps is defined as:
`set_timesteps(num_inference_steps=100, denoising_strength=1.0, training=False, shift=None)`
- `training=False` (default): only builds `self.sigmas` and `self.timesteps`.
- `training=True`: additionally computes `self.linear_timesteps_weights`, which is later consumed by `training_weight(timestep)` for per-timestep loss weighting. (huggingface.co)
So training is not “how many training timesteps”—it’s a flag to enable training-time timestep weights.
Typical usage for training with FlowMatch:
- Call `scheduler.set_timesteps(N, training=True)` where `N` is the number of discrete timesteps you want to sample from (often `N=1000` to match the model's `num_train_timesteps` scale).
- Sample an index/value from `scheduler.timesteps`, use `add_noise(...)`, and optionally weight the loss via `scheduler.training_weight(timestep)`. (huggingface.co)
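The usage just described can be condensed into a runnable toy. The stand-in class below only mimics the shape of the diffsynth API quoted here (linear sigmas, uniform weights); it is not the real implementation:

```python
import random

class ToyFlowMatchScheduler:
    """Minimal stand-in mirroring the FlowMatchScheduler surface above."""

    def set_timesteps(self, num_inference_steps=100, training=False):
        # sigmas run from 1.0 down toward 0; timesteps scale them to [0, 1000]
        self.sigmas = [1.0 - i / num_inference_steps for i in range(num_inference_steps)]
        self.timesteps = [s * 1000 for s in self.sigmas]
        if training:
            # training=True additionally prepares per-timestep loss weights
            self.linear_timesteps_weights = [1.0] * num_inference_steps

    def add_noise(self, sample, noise, timestep):
        # flow-matching interpolation between data and noise
        sigma = timestep / 1000
        return (1 - sigma) * sample + sigma * noise

    def training_weight(self, timestep):
        # weight of the sigma bucket nearest the sampled timestep
        idx = min(
            range(len(self.timesteps)),
            key=lambda i: abs(self.timesteps[i] - timestep),
        )
        return self.linear_timesteps_weights[idx]

scheduler = ToyFlowMatchScheduler()
scheduler.set_timesteps(1000, training=True)
t = random.choice(scheduler.timesteps)
noisy = scheduler.add_noise(0.0, 1.0, t)   # scalar stand-ins for latents/noise
weight = scheduler.training_weight(t)
```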
EnhancedDDIMScheduler (DDIM-style training scripts)
Some DiffSynth training code resets the scheduler and calls:
scheduler.set_timesteps(1000)
Here, there is no training= flag involved; it’s just setting the timestep grid length (often to 1000). (gitee.com)
Citations:
- 1: https://huggingface.co/spaces/svjack/FantasyTalking/blob/main/diffsynth/schedulers/flow_match.py
- 2: https://huggingface.co/spaces/svjack/FantasyTalking/blob/main/diffsynth/schedulers/flow_match.py
- 3: https://gitee.com/mirrors/diffsynth-studio/blob/ExVideo/ExVideo_animatediff_train.py?utm_source=openai
🏁 Script executed:
```shell
# Check if diffsynth is listed in a separate requirements file for the plugin
find api/transformerlab/plugins/diffusion_trainer -name "*.txt" -o -name "*.toml" | xargs cat 2>/dev/null
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check for any setup files or configuration in the diffusion_trainer directory
ls -la api/transformerlab/plugins/diffusion_trainer/
```

Repository: transformerlab/transformerlab-app
Length of output: 432
🏁 Script executed:
```shell
# Search the entire codebase for where diffsynth dependency might be specified
rg "diffsynth" --type toml --type txt
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check what's in setup.sh - this likely manages plugin dependencies
cat api/transformerlab/plugins/diffusion_trainer/setup.sh
```

Repository: transformerlab/transformerlab-app
Length of output: 398
🏁 Script executed:
```shell
# Also check if there's any documentation about how plugin dependencies are managed
find . -name "setup.sh" -exec cat {} \; | head -100
```

Repository: transformerlab/transformerlab-app
Length of output: 4920
Pin diffsynth version to ensure API compatibility across environments.
The code correctly uses ZImagePipeline.from_pretrained(device=..., model_configs=..., tokenizer_config=...) and scheduler.set_timesteps(..., training=True), which are valid in current diffsynth versions. However, setup.sh installs diffsynth without a version constraint, whereas peft is pinned to >=0.15.0. Without version pinning, future diffsynth releases could introduce breaking API changes. Add a version constraint (e.g., "diffsynth>=0.X.Y") to setup.sh to guarantee consistent behavior.
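Concretely, the suggested change to setup.sh might look like the following; the version is deliberately left as a placeholder, to be replaced with whatever release the plugin was actually validated against:

```shell
# Pin diffsynth so ZImagePipeline.from_pretrained and
# scheduler.set_timesteps(..., training=True) keep the expected signatures.
# Replace <tested-version> with the release this plugin was validated against.
uv pip install "peft>=0.15.0" "diffsynth==<tested-version>"
```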
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 491 - 553,
The build currently relies on diffsynth APIs used around
ZImagePipeline.from_pretrained and pipe.scheduler.set_timesteps (seen in
main.py), but setup.sh installs diffsynth without a version pin; update setup.sh
to pin diffsynth to a compatible minimum/locked version (e.g., change the
install spec to diffsynth>=0.X.Y or a specific tested release) so the pipeline
code (ZImagePipeline.from_pretrained, scheduler.set_timesteps, and related
behavior) remains stable across environments.
In `api/transformerlab/plugins/diffusion_trainer/main.py`:

```python
if is_zimage:
    pixel_values = batch["pixel_values"].to(device, dtype=weight_dtype)
    input_latents = vae_encoder(pixel_values)
    prompt_embeds = encode_prompt_zimage(pipe, batch["prompt"], device)

    loss = FlowMatchSFTLoss(
        pipe,
        input_latents=input_latents,
        prompt_embeds=prompt_embeds,
        image_embeds=None,
        image_latents=None,
        use_gradient_checkpointing=args.get("gradient_checkpointing", False),
        use_gradient_checkpointing_offload=False,
    )
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find where FlowMatchSFTLoss is defined or imported
rg "FlowMatchSFTLoss" --type py -B 2 -A 2
```

Repository: transformerlab/transformerlab-app
Length of output: 975
🏁 Script executed:
```shell
# Also check the specific file to see imports
head -100 api/transformerlab/plugins/diffusion_trainer/main.py | grep -E "^import|^from"
```

Repository: transformerlab/transformerlab-app
Length of output: 895
🏁 Script executed:
```shell
# Check if it's defined in the file itself
rg "class FlowMatchSFTLoss" --type py -A 20
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check if diffsynth is vendored or installed
find . -name "*diffsynth*" -type d | head -20
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check requirements files for diffsynth version
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" | xargs grep -l "diffsynth" 2>/dev/null
```

Repository: transformerlab/transformerlab-app
Length of output: 61
🏁 Script executed:
```shell
# Search for FlowMatchSFTLoss in the entire repo
rg "class FlowMatchSFTLoss" --type py
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check for any diffsynth submodules or vendored code
ls -la | grep -i diffsynth || echo "No top-level diffsynth directory"
```

Repository: transformerlab/transformerlab-app
Length of output: 109
🌐 Web query:
diffsynth FlowMatchSFTLoss class signature parameters return type
💡 Result:
I couldn’t find any public FlowMatchSFTLoss class in the searchable DiffSynth-Studio sources/docs or the PyPI diffsynth project page (v2.0.4, released Jan 30, 2026). The only clearly indexed “FlowMatch*” API I could locate is FlowMatchScheduler (scheduler implementation), not an SFT loss class. [1], [2]
Because FlowMatchSFTLoss doesn’t appear in public docs/search results, I can’t reliably state its class signature, __init__ parameters, or forward/return type from web sources.
How to get the exact signature from your installed diffsynth:
```python
import inspect
from diffsynth import FlowMatchSFTLoss  # or its real import path if different

print(FlowMatchSFTLoss)
print(inspect.signature(FlowMatchSFTLoss.__init__))
print(inspect.signature(FlowMatchSFTLoss.forward))
print(inspect.getsource(FlowMatchSFTLoss.forward))
```

If you paste the import path you're using (or the traceback showing where FlowMatchSFTLoss is defined), I can tell you the exact parameters and return type from that definition.
Sources:
[1] PyPI project page for diffsynth (v2.0.4, Jan 30, 2026) (pypi.org)
[2] DiffSynth-Studio docs mentioning FlowMatchScheduler (diffsynth-studio.readthedocs.io)
Citations:
- 1: https://pypi.org/project/diffsynth/
- 2: https://diffsynth-studio.readthedocs.io/zh-cn/latest/tutorial/Schedulers.html?utm_source=openai
🏁 Script executed:
```shell
# Check if there's a local diffsynth implementation in the codebase
rg "from diffsynth.diffusion" --type py -l
```

Repository: transformerlab/transformerlab-app
Length of output: 129
🏁 Script executed:
```shell
# Search for any loss-related classes defined locally
rg "class.*Loss" --type py | grep -i "flowmatch\|sft"
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check if diffsynth.diffusion.loss is implemented anywhere locally
find . -path "*/diffsynth/diffusion/loss*" -type f
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check the git history to see if this was recently added
git log --oneline --all -- "*FlowMatchSFTLoss*" 2>/dev/null | head -10
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check what happens with the 'loss' variable after it's created
sed -n '1005,1050p' api/transformerlab/plugins/diffusion_trainer/main.py
```

Repository: transformerlab/transformerlab-app
Length of output: 2444
🏁 Script executed:
```shell
# Check for requirements to see diffsynth version constraint
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" | xargs cat 2>/dev/null | grep -i diffsynth
```

Repository: transformerlab/transformerlab-app
Length of output: 59
🏁 Script executed:
```shell
# Check if there's a git submodule or fork
cat .gitmodules 2>/dev/null | grep -i diffsynth || echo "No submodule found"
```

Repository: transformerlab/transformerlab-app
Length of output: 95
🏁 Script executed:
```shell
# Check for any notes or documentation about diffsynth version
find . -name "*.md" -o -name "*.txt" | xargs grep -l "FlowMatchSFTLoss\|diffsynth.*version" 2>/dev/null | head -5
```

Repository: transformerlab/transformerlab-app
Length of output: 59
FlowMatchSFTLoss is not found in any public diffsynth version—verify dependency.
FlowMatchSFTLoss does not exist in the publicly available diffsynth library (v2.0.4, latest on PyPI). The code imports from diffsynth.diffusion.loss, but this class is undocumented and unreleased. Confirm that your development environment uses a specific diffsynth fork, branch, or unreleased version, and document this dependency in requirements.txt or pyproject.toml to prevent deployment failures. The usage pattern (.item() at line 1019) indicates it should return a PyTorch tensor, but this callable contract cannot be verified without access to the actual implementation.
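A generic guard for this class of problem — resolving an attribute only if the installed package actually provides it — can be sketched as follows (the helper name is mine, and the probe in the comment uses the import path seen in main.py):

```python
import importlib

def optional_import(module_name, attr):
    """Return `attr` from `module_name`, or None if either is unavailable.

    Useful for probing whether the installed diffsynth build actually ships
    FlowMatchSFTLoss before training starts, instead of failing mid-run.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)

# Example probe:
# FlowMatchSFTLoss = optional_import("diffsynth.diffusion.loss", "FlowMatchSFTLoss")
# if FlowMatchSFTLoss is None:
#     raise RuntimeError("installed diffsynth does not provide FlowMatchSFTLoss; "
#                        "install the documented fork/commit")
```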
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 1005 -
1018, The code references an unreleased class FlowMatchSFTLoss (imported from
diffsynth.diffusion.loss) which isn't available in public diffsynth v2.0.4;
update the codebase and dependency declarations: either replace FlowMatchSFTLoss
usage in main.py (around the is_zimage branch where input_latents,
prompt_embeds, vae_encoder, and encode_prompt_zimage are used) with a public,
supported loss class or vendor the missing implementation, and then pin and
document the exact diffsynth fork/commit or custom package in requirements.txt
or pyproject.toml; also ensure the replacement/vendored code returns a PyTorch tensor
(compatible with the .item() call) and preserve the gradient_checkpointing flags
(use_gradient_checkpointing and use_gradient_checkpointing_offload) so runtime
behavior remains consistent.
In `api/transformerlab/plugins/diffusion_trainer/setup.sh`:

```shell
# Install compatible torch and torchvision first to avoid version conflicts
uv pip install torch torchvision diffusers transformers --extra-index-url https://download.pytorch.org/whl/cu118

# Install PEFT and diffsynth
uv pip install "peft>=0.15.0" diffsynth
```
🧩 Analysis chain
🌐 Web query:
torch diffusers peft transformers compatibility requirements 2025
💡 Result:
Practical compatibility requirements (late‑2025 / 2026-era HF stack)
Diffusers
`diffusers` v0.36.0 is tested on Python 3.8+ and PyTorch 1.4+, and the docs explicitly install it alongside `transformers` (e.g. `diffusers["torch"] transformers`). [1]
Diffusers ↔ PEFT (LoRA/adapters)
- If you install Diffusers from git / dev builds (e.g. `0.36.0.dev0`), Diffusers can enforce `peft>=0.17.0`; a real-world failure report shows an import error when `peft==0.7.0` is present and Diffusers requires `peft>=0.17.0`. [2]
Transformers ↔ PEFT
- Transformers v5 is incompatible with `peft<0.18.0`; if you plan to use Transformers v5, use `peft>=0.18.0`. [3]
- PEFT 0.18.0+ also requires Python 3.10+ (drops 3.9). [3]
Transformers ↔ huggingface-hub (common resolver trap)
- Recent `transformers` (example given: 4.57.3, as of 2026-01-05) still requires `huggingface-hub<1.0`; letting your environment pick `huggingface-hub==1.x` can force downgrades/breakage. [4]
A “safe” 2025-ish pinning strategy (what the above implies)
- If staying on Transformers 4.x: use
diffusers>=0.36,transformers~=4.5x,peft>=0.17, and ensurehuggingface-hub<1.0. [1][2][4] - If moving to Transformers 5.x: use
peft>=0.18.0(and therefore Python 3.10+), plus whatever Diffusers release notes recommend for that Diffusers version. [3]
Sources: Diffusers installation docs v0.36.0 [1]; Diffusers issue documenting peft>=0.17.0 requirement on dev installs [2]; PEFT 0.18.0 release notes on Transformers v5 + Python requirement [3]; dependency note on transformers requiring huggingface-hub<1.0 [4].
🏁 Script executed:

```shell
# Search for requirements files and dependency specifications in the project
find . -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "constraints*.txt" \) | head -20
```

🏁 Script executed:

```shell
# Check the setup.sh file and surrounding context
cat -n api/transformerlab/plugins/diffusion_trainer/setup.sh
```

🏁 Script executed:

```shell
# Look for other setup or installation documentation
find . -type f \( -name "*.md" -o -name "*.txt" \) | xargs grep -l "torch\|diffusers\|peft" 2>/dev/null | head -10
```

🏁 Script executed:

```shell
# Check the actual pyproject.toml for dependency management
cat -n api/pyproject.toml | head -100
```

🏁 Script executed:

```shell
# Search for torch/base plugin setup references
rg -n "torch|base.plugin|base_plugin|plugin.*venv" --type py --type toml --type sh api/
```

🏁 Script executed:

```shell
# Check if there's a parent/base setup script
find api/transformerlab/plugins -name "setup.sh" -o -name "base*.sh" | xargs cat -n
```

🏁 Script executed:

```shell
# Check if other plugins use version pinning or rely on project-level pins
rg -A2 "uv pip install" api/transformerlab/plugins/*/setup.sh | grep -E "(peft|diffusers|transformers)" | head -30
```

🏁 Script executed:

```shell
# Verify the actual pyproject.toml peft versions for all platforms
grep -A5 "peft==" api/pyproject.toml
```
Increase the `peft` version constraint to `>=0.17.0` for compatibility with `diffusers` 0.36.0.
The specified `peft>=0.15.0` is below the minimum required by `diffusers` 0.36.0 (which requires `peft>=0.17.0` per the Hugging Face release notes). This conflicts with other diffusion-based plugins in the codebase (`llama_trainer`, `image_diffusion`, `fastchat_server`), which correctly use `peft>=0.17.0`. Additionally, the project-level `pyproject.toml` pins `peft` to 0.14.0 (nvidia/cpu) and 0.15.2 (rocm), which will force a downgrade and create a version mismatch at runtime.
Suggested fix:

```shell
uv pip install "peft>=0.17.0" diffsynth
```

Verify this is also addressed in the project-level `pyproject.toml` optional dependencies.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/plugins/diffusion_trainer/setup.sh` around lines 3 - 7,
Update the PEFT requirement from "peft>=0.15.0" to "peft>=0.17.0" in the shell
install line (replace the existing uv pip install "peft>=0.15.0" diffsynth
command with uv pip install "peft>=0.17.0" diffsynth) and also adjust the PEFT
version constraint in the project-level pyproject.toml optional dependencies
entries so they no longer pin to 0.14.0/0.15.2 but allow >=0.17.0, ensuring
consistency with diffusers 0.36.0 and other diffusion plugins.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/plugin_sdk/plugin_harness.py`:
- Around line 110-119: The parameter target_env_var in set_config_env_vars is
declared Optional but later used as a dict key in os.environ[target_env_var],
which will raise if None; update set_config_env_vars to either require
target_env_var (change its type to str) or add a guard at the top of the
function (e.g., raise ValueError or return early if target_env_var is None)
before calling get_db_config_value and setting os.environ, referencing the
function name set_config_env_vars and the os.environ assignment to locate and
fix the issue.
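The guard the prompt describes is a one-line check at the top of the function. A sketch with stand-in signatures — the real `set_config_env_vars` and `get_db_config_value` live in `plugin_harness.py` and differ in detail:

```python
import os
from typing import Optional

def get_db_config_value(key: str,
                        team_id: Optional[str] = None,
                        user_id: Optional[str] = None) -> Optional[str]:
    # Stand-in for the real DB lookup; returns None when the key is unset.
    return {"API_KEY": "secret"}.get(key)

def set_config_env_vars(config_key: str,
                        target_env_var: Optional[str] = None,
                        team_id: Optional[str] = None,
                        user_id: Optional[str] = None) -> None:
    # Guard suggested by the review: fail fast instead of letting
    # os.environ[None] raise a confusing TypeError later.
    if target_env_var is None:
        raise ValueError("target_env_var must be provided")
    value = get_db_config_value(config_key, team_id=team_id, user_id=user_id)
    if value is not None:
        os.environ[target_env_var] = value
```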
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/plugins/diffusion_trainer/setup.sh`:
- Around line 1-9: The install script exits successfully even if the core
dependency install fails; update the shell script (setup.sh) to fail fast by
enabling strict error handling (add set -euo pipefail at the top) or by
appending explicit failure checks to the uv pip install invocation (e.g., ensure
the "uv pip install" command in the script will exit non‑zero on failure and
propagate that by using || exit 1). Target the top of the script and the "uv pip
install" line to ensure dependency install failures surface immediately.
I fixed this and tested on my Azure VM.
🧹 Nitpick comments (1)
api/transformerlab/plugins/image_diffusion/main.py (1)
4-5: Consider centralizing model-reference helpers to avoid drift.
These utilities mirror the ones in `api/transformerlab/plugins/image_diffusion/diffusion_worker.py`; extracting them into a shared module would reduce duplication and keep behavior consistent.
Also applies to: 327-474
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/plugins/image_diffusion/main.py` around lines 4 - 5, The model-reference helper functions duplicated between image_diffusion main and diffusion_worker should be extracted into a single shared module (e.g., model_reference_helpers) and both modules should import those helpers instead of keeping separate copies; locate the duplicated utilities in api/transformerlab/plugins/image_diffusion/main.py and api/transformerlab/plugins/image_diffusion/diffusion_worker.py, move the helper definitions into the new shared module, update both files to import the helpers (removing the local copies and any redundant imports like inspect/Path if no longer needed), and run/adjust any unit tests or usage sites to ensure the unified helpers' API matches prior behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@api/transformerlab/plugins/image_diffusion/diffusion_worker.py`:
- Around line 12-221: The helper utilities in diffusion_worker.py (functions
_is_probable_hf_repo_id, _extract_hf_repo_from_model_metadata,
resolve_diffusion_model_reference, filter_generation_kwargs_for_pipeline)
duplicate logic from main.py; refactor by extracting these helpers into a single
shared module (e.g., a utilities or helpers module) and replace the local
implementations with imports from that module, updating any local references
(including imports used inside _extract_hf_repo_from_model_metadata such as
ModelService/asyncio) to use the centralized implementation so there is a single
source of truth and no duplicated code.
---
Nitpick comments:
In `@api/transformerlab/plugins/image_diffusion/main.py`:
- Around line 4-5: The model-reference helper functions duplicated between
image_diffusion main and diffusion_worker should be extracted into a single
shared module (e.g., model_reference_helpers) and both modules should import
those helpers instead of keeping separate copies; locate the duplicated
utilities in api/transformerlab/plugins/image_diffusion/main.py and
api/transformerlab/plugins/image_diffusion/diffusion_worker.py, move the helper
definitions into the new shared module, update both files to import the helpers
(removing the local copies and any redundant imports like inspect/Path if no
longer needed), and run/adjust any unit tests or usage sites to ensure the
unified helpers' API matches prior behavior.
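The extraction could start with the simplest of the duplicated helpers. A sketch of a hypothetical shared module — the module name follows the review's suggestion, and the heuristic is an assumption about what the private helper checks, not the actual codebase:

```python
# model_reference_helpers.py -- hypothetical shared module per the
# review suggestion; both main.py and diffusion_worker.py would import
# from here instead of keeping private copies.

def is_probable_hf_repo_id(ref: str) -> bool:
    """Heuristic: Hugging Face repo ids look like "org/name" --
    exactly one slash with non-empty segments on both sides."""
    parts = ref.split("/")
    return len(parts) == 2 and all(parts)
```

Both call sites would then do `from model_reference_helpers import is_probable_hf_repo_id`, giving a single source of truth.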
```shell
# This avoids resolver-driven torch/torchvision drift (e.g. missing torchvision::nms).
uv pip install \
    "diffusers==0.36.0" \
    "transformers==4.57.1" \
```
This is the standard diffusers and transformers version we are using. You shouldn't have to put this in here?
Or are you saying that you want to upgrade peft but doing so changes diffusers and transformers version? If so, do we just upgrade peft in pyproject.toml?
Or is it because of diffsynth?
🧹 Nitpick comments (1)
api/transformerlab/plugin_sdk/plugin_harness.py (1)
110-123: Log message doesn't distinguish global config from team config.
Line 121 prints "user" or "team", but when both `user_id` and `team_id` are None, the config is actually sourced from global scope. Consider updating for accuracy:

```diff
- print(f"Set {target_key} from {'user' if user_id else 'team'} config")
+ source = "user" if user_id else ("team" if team_id else "global")
+ print(f"Set {target_key} from {source} config")
```

Similarly for line 123.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/plugin_sdk/plugin_harness.py` around lines 110 - 123, In set_config_env_vars, the log messages use {'user' if user_id else 'team'} which is incorrect when both user_id and team_id are None (global config); compute a source string like source = 'user' if user_id else 'team' if team_id else 'global' and use that source variable in both the success print (after setting os.environ[target_key]) and the exception warning so logs correctly show 'user', 'team', or 'global'; reference set_config_env_vars, target_key, get_db_config_value and os.environ when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@api/transformerlab/plugin_sdk/plugin_harness.py`:
- Around line 110-123: In set_config_env_vars, the log messages use {'user' if
user_id else 'team'} which is incorrect when both user_id and team_id are None
(global config); compute a source string like source = 'user' if user_id else
'team' if team_id else 'global' and use that source variable in both the success
print (after setting os.environ[target_key]) and the exception warning so logs
correctly show 'user', 'team', or 'global'; reference set_config_env_vars,
target_key, get_db_config_value and os.environ when making the change.
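The fix amounts to a three-way fallback. A standalone sketch of the suggested precedence logic:

```python
from typing import Optional

def config_source(user_id: Optional[str], team_id: Optional[str]) -> str:
    # Precedence per the suggestion: user overrides team, and when both
    # are absent the value came from global config -- the case the
    # original log message mislabeled as "team".
    if user_id:
        return "user"
    if team_id:
        return "team"
    return "global"
```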
Diffusion works for me, but training doesn't. I'm getting this error about FA3 in xformers for some reason. Let's make a decision at scrum.
…ab-app into add/z-image-ft
Overall, I don't think this is usable for anyone with a smaller GPU. It crashes for me on a dataset of 5 images with 24 GB of VRAM. I was able to make FLUX run with sharding, but instead of sharding here, the easiest thing to do is look at the VRAM management section, along with the other recommendations, in https://github.com/modelscope/DiffSynth-Studio/blob/main/docs/en/Model_Details/Z-Image.md
I would recommend we don't make any more changes to this. Let's close it and remake everything so that it just clones DiffSynth-Studio and executes their scripts directly. We can make a new PR after we move to local providers?
Tagging @dadmobile here for further opinions
```python
from typing import Optional


def get_db_config_value(key: str, team_id: Optional[str] = None, user_id: Optional[str] = None) -> Optional[str]:
```
Why did you remove the import from transformerlab.plugin and add the function here directly? Was there an issue?
| if "ncclCommShrink" in str(e): | ||
| print( | ||
| "Detected CUDA/NCCL mismatch while importing torch. " | ||
| "Reinstall the plugin venv with a torch build matching this machine's CUDA runtime." |
We should never face this issue since we do the base install, right?
```python
    return None


def resolve_diffusion_model_reference(model: str) -> str:
```
I don't think we need/support this right now; diffusers itself requires `model_index.json`.
```python
# cache_key = get_pipeline_key(model, adaptor, is_img2img, is_inpainting)

with _PIPELINES_LOCK:
    resolved_model = resolve_diffusion_model_reference(model)
```
We wouldn't need resolving here, right? The plugin_harness provides all the info correctly, and you wouldn't reach this stage if something was unresolved.
OK, agreed. @ParamThakkar123 I know you did a tonne on this PR, but let's take what we learned from it and instead focus on making a task with DiffSynth on the new-style tasks.